Rapid Catalytic Template Searching as an Enzyme Function Prediction Procedure
Abstract
We present Catalytic Site Identification (CatSId), an algorithm that identifies enzyme function by identifying catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also efficient in both run time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated catalytic residues, the Catalytic Site Atlas (CSA). Two main processes are involved. The first is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second incorporates a number of physical descriptors, including binding site predictions, in a logistic scoring procedure to re-score the matches found in the first process. This approach performs well overall, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.971 for the training set evaluated. The procedure handles cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, with AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites, whose templates contain more than two critical (catalytic) residues, perform excellently, with AUCs greater than 0.90 for both the training and test data. The procedure has considerable promise for larger-scale searches.
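The two-stage idea in the abstract (geometric template matching followed by logistic re-scoring) can be illustrated with a small sketch. This is not the CatSId implementation: the brute-force backtracking search, the single representative coordinate per residue, the distance tolerance `tol`, and the feature weights are all assumptions chosen for illustration, and the search here makes no attempt to reproduce the scaling behaviour reported for the published method.

```python
import math

def match_template(template, residues, tol=1.0):
    """Find assignments of template residues to query residues.

    template, residues: lists of (residue_type, (x, y, z)) tuples
    (hypothetical input format, one representative point per residue).
    A candidate is kept when the residue types agree and every pairwise
    distance matches the template's within `tol` (same units as the input).
    Note: comparing distances alone cannot tell mirror images apart.
    """
    matches = []

    def extend(assigned):
        k = len(assigned)
        if k == len(template):
            matches.append(tuple(assigned))
            return
        t_type, t_xyz = template[k]
        for i, (r_type, r_xyz) in enumerate(residues):
            if r_type != t_type or i in assigned:
                continue
            # Compare distances to all previously assigned residues.
            consistent = all(
                abs(math.dist(r_xyz, residues[j][1])
                    - math.dist(t_xyz, template[m][1])) <= tol
                for m, j in enumerate(assigned)
            )
            if consistent:
                extend(assigned + [i])

    extend([])
    return matches

def logistic_score(features, weights, bias):
    """Map a match's descriptor values to a probability-like score in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy three-residue template and a query that contains a translated copy.
template = [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("SER", (0, 4, 0))]
query = [
    ("GLY", (1, 1, 1)),
    ("HIS", (10, 10, 10)),
    ("ASP", (13, 10, 10)),
    ("SER", (10, 14, 10)),
    ("ASP", (30, 30, 30)),  # decoy: right type, wrong geometry
]
hits = match_template(template, query)
# Illustrative descriptors and weights only; not values from the paper.
score = logistic_score([0.2, 1.0], weights=[-2.0, 1.5], bias=0.0)
```

In a real pipeline the features passed to the logistic function would be the physical descriptors computed for each match (for example a geometric deviation and a binding-site prediction), and the weights would be fitted on a labelled training set.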
Similar resources
A New RSTB Invariant Image Template Matching Based on Log-Spectrum and Modified ICA
Template matching is a widely used technique in many image processing and machine vision applications. In this paper we propose a new, fast, and reliable template matching algorithm which is invariant to Rotation, Scale, Translation and Brightness (RSTB) changes. For this purpose, we adopt the idea of ring projection transform (RPT) of image. In the proposed algorithm, two novel s...
GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms
Enzyme active sites are important and conserved functional regions of proteins whose identification can be an invaluable step toward protein function prediction. Most of the existing methods for this task are based on active site similarity and present limitations including performing only exact matches on template residues, template size restraints, despite not being capable of finding inter-d...
CRHunter: integrating multifaceted information to predict catalytic residues in enzymes
A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarit...
Prediction of Fe-Co-Mn/MgO Catalytic Activity in Fischer-Tropsch Synthesis Using Nu-support Vector Regression
Support vector regression (SVR) is a learning method based on the support vector machine (SVM) that can be used for curve fitting and function estimation. In this paper, the ability of the nu-SVR to predict the catalytic activity of the Fischer-Tropsch (FT) reaction is evaluated and the result is compared with two other prediction techniques including: multilayer perceptron (MLP) and subtractiv...
BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions
BioLiP (http://zhanglab.ccmb.med.umich.edu/BioLiP/) is a semi-manually curated database for biologically relevant ligand-protein interactions. Establishing interactions between protein and biologically relevant ligands is an important step toward understanding the protein functions. Most ligand-binding sites prediction methods use the protein structures from the Protein Data Bank (PDB) as templ...